dd (Unix)

In computing, dd is a common Unix program whose primary purpose is the low-level copying and conversion of raw data. According to the manual page for Version 7 Unix,[1] it will "convert and copy a file". It is used to copy a specified number of bytes or blocks, performing on-the-fly byte order conversions, as well as more esoteric EBCDIC to ASCII conversions.[2] It can also be used to copy regions of raw device files, for example backing up the boot sector of a hard disk, or to read fixed amounts of data from special files like /dev/zero or /dev/random.[3]

The name dd may stand for "data" or "disk duplication".

The syntax of dd was likely inspired by the DD statement found in IBM JCL;[4] in JCL, "DD" stands for Data Definition.[5] The Jargon File states that the command is rumored to have been based on IBM's JCL, and that the syntax may have been a joke.[4]

Usage

The command line syntax of dd is significantly different from most other Unix programs, and because of its ubiquity it is resistant to recent attempts to enforce a common syntax for all command line tools. Generally, dd uses an option=value format, whereas most Unix programs use either -option value or --option=value format. Also, the input is specified using the if (from input file) option, while most programs simply take the name by itself.

Usage varies across different operating systems. Also, certain features of dd depend on the capabilities of the computer system, such as dd's ability to implement an option for direct memory access. Sending a SIGINFO signal (or a USR1 signal on Linux) to a running dd process makes it print I/O statistics to standard error and then continue copying. dd can read standard input from the keyboard; when EOF (end of file) is read, dd exits. Signals and EOF are determined by the software. For example, Unix tools ported to Windows vary as to the EOF character: Cygwin uses <ctrl-d> (the usual Unix EOF) and MKS Toolkit uses <ctrl-z> (the usual Windows EOF).

In compliance with the Unix philosophy, dd does one thing well. Unlike a sophisticated and highly abstracted utility, dd has no algorithm other than in the low-level decisions of the user concerning how to vary the run options. Often the options are changed for each run of dd in a multi-step process to solve a computer problem.

Output messages

The GNU variant of dd as supplied with Linux does not describe the format of the messages displayed on standard error on completion; however, these are described by other implementations, e.g. the one supplied with BSD.

Each of the "Records in" and "Records out" lines shows the number of complete blocks transferred plus the number of partial blocks. A partial block occurs, for example, when the physical medium ends before a complete block is read.
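The counting of complete and partial blocks can be seen on a small scratch file. The following is a minimal sketch, not taken from any dd manual; the file name ten.bin and the variables workdir and stats are hypothetical.

```shell
# Scratch demonstration on a regular file; ten.bin is a hypothetical name.
workdir=$(mktemp -d)
printf 'abcdefghij' > "$workdir/ten.bin"    # exactly 10 bytes

# With bs=4, dd transfers two complete 4-byte blocks and one partial 2-byte block.
# The statistics go to standard error, so that stream is redirected to capture them.
stats=$(dd if="$workdir/ten.bin" of=/dev/null bs=4 2>&1)
printf '%s\n' "$stats"
```

The first two lines printed should read "2+1 records in" and "2+1 records out"; the wording of the final summary line differs between implementations.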

Block size

Block size is a crucial operating factor. Each run of dd uses one set of block sizes: one for input and one for output. Block sizes can adapt dd to the realm of its application, and to the phase of an operation involving many runs of dd. The input block size is set with ibs and the output block size with obs; bs sets both and overrides ibs and obs. The conversion block size cbs applies to the block and unblock conversions, and conv=sync pads each input block to the input block size.

For example, in data recovery in an area of errors on a hard drive, the most bytes will be recovered by using a small block size; for the greatest speed, a large block size is chosen, up to the point of diminishing returns on the system dd runs on. If the transfer uses a network, dd can operate using a suitable block size depending on congestion levels.

Some implementations understand the letter x as a multiplication operator in the block size and count parameters:

dd bs=2x80x18b if=/dev/fd0 of=floppy.image

where the b suffix indicates that the units are 512-byte blocks. Unix block devices use this as their allocation unit by default.

In the value of the bs field, a decimal number can be followed by one of these suffixes:

w means 2
b means 512
k means 1024
M means 1024² (1048576)
G means 1024³ (1073741824)

Hence bs=2x80x18b means 2 × 80 × 18 × 512 = 1474560 which is the exact size of a 1440 KiB floppy disk.
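The arithmetic can be checked with shell integer arithmetic. This is only an illustrative sketch; the variable name floppy_bytes is hypothetical.

```shell
# 2 sides x 80 tracks x 18 sectors per track, each sector one 512-byte block
floppy_bytes=$((2 * 80 * 18 * 512))
echo "$floppy_bytes"              # 1474560
echo $((floppy_bytes / 1024))     # 1440 (KiB)
```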

Progress information

dd is a silent tool, which makes it very useful for scripting. To make its progress visible on a GNU/Linux machine, send the running process a USR1 signal. First, in a different terminal, obtain the process ID of the dd process by running

 ps -a

You may get output like
18255 pts/5 00:00:00 ssh
24084 pts/2 00:00:04 dd
24334 pts/4 00:00:00 ps

To send a USR1 signal to dd, continue with the following:

 sudo kill -USR1 24084

In the terminal where dd is running you will see its output, something like:
349389+0 records in
349389+0 records out
1431097344 bytes (1.4 GB) copied, 935.624 s, 1.5 MB/s
This can be repeated as many times as required to follow the progress.
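Newer GNU versions of dd can also report progress by themselves. The following is a minimal sketch that assumes GNU dd from coreutils 8.24 or later, where the status=progress operand prints periodic transfer statistics to standard error; other implementations may reject this operand.

```shell
# Assumes GNU dd (coreutils 8.24 or later); status=progress is not portable.
# Copying zeros to /dev/null is a harmless stand-in for a real transfer.
dd if=/dev/zero of=/dev/null bs=1M count=64 status=progress
```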

Data transfer

dd can duplicate data across files, devices, partitions and volumes. The data may be input or output to and from any of these; but there are important differences concerning the output when going to a partition. Also, during the transfer, the data can be modified using the conv options to suit the medium.

An attempt to copy an entire disk using cp may omit the final block if it has an unexpected length, whereas dd may succeed. The source and destination disks should have the same size.

Some data transfer forms of dd:

To create an ISO disk image from a CD-ROM:

 dd if=/dev/sr0 of=myCD.iso bs=2048 conv=noerror,sync

To clone one partition to another:

 dd if=/dev/sda2 of=/dev/sdb2 bs=4096 conv=noerror

To clone a hard disk "ad0" to "ad1":

 dd if=/dev/ad0 of=/dev/ad1 bs=1M conv=noerror

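The noerror option means to keep going if there is a read error. The sync option pads every input block with null bytes to the full input block size, so that data following an unreadable block keeps its original offset in the output.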

Master boot record

It is possible to repair a master boot record. It can be transferred to and from a repair file. To duplicate the first two sectors of a floppy drive:

 dd if=/dev/fd0 of=MBRboot.img bs=512 count=2

To create an image of the entire master boot record (including the partition table):

 dd if=/dev/sda of=MBR.img bs=512 count=1

To create an image of only the boot code of the master boot record (without the partition table):

 dd if=/dev/sda of=MBR_boot.img bs=446 count=1
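The 446-byte figure comes from the classic MBR layout: a 512-byte sector holds 446 bytes of boot code, a 64-byte partition table and a 2-byte signature. As a hedged sketch, the partition table alone can be extracted with skip; a scratch image is used here instead of a real disk, and the names disk.img and ptable.img are hypothetical.

```shell
# Scratch image standing in for a disk; disk.img and ptable.img are hypothetical names.
workdir=$(mktemp -d)
dd if=/dev/zero of="$workdir/disk.img" bs=512 count=4 2>/dev/null

# With bs=1, skip=446 passes over the boot code and count=64 copies the partition table.
dd if="$workdir/disk.img" of="$workdir/ptable.img" bs=1 skip=446 count=64 2>/dev/null

wc -c < "$workdir/ptable.img"    # 64
```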

Data modification

dd can modify data in place.

Overwrite the first 512 bytes of a file with null bytes:

 dd if=/dev/zero of=path/to/file bs=512 count=1 conv=notrunc

The notrunc conversion option means do not truncate the output file — that is, if the output file already exists, just replace the specified bytes and leave the rest of the output file alone. Without this option, dd would create an output file 512 bytes long.
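The effect of notrunc can be observed on scratch files. This is a minimal sketch; the file names keep.bin and cut.bin are hypothetical.

```shell
workdir=$(mktemp -d)
# Two identical 2048-byte scratch files (hypothetical names).
dd if=/dev/urandom of="$workdir/keep.bin" bs=512 count=4 2>/dev/null
cp "$workdir/keep.bin" "$workdir/cut.bin"

# With notrunc only the first 512 bytes are replaced; the file keeps its length.
dd if=/dev/zero of="$workdir/keep.bin" bs=512 count=1 conv=notrunc 2>/dev/null
# Without notrunc the output file is truncated first, leaving just 512 bytes.
dd if=/dev/zero of="$workdir/cut.bin" bs=512 count=1 2>/dev/null

wc -c < "$workdir/keep.bin"    # 2048
wc -c < "$workdir/cut.bin"     # 512
```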

To duplicate a disk partition as a disk image file on a different partition:

 dd if=/dev/sdb2 of=partition.image bs=4096 conv=noerror

Disk wipe

For security reasons, a storage device should be wiped before it is discarded.

To check whether a drive has data on it, send the output to standard output:

 dd if=/dev/sda 

To wipe a disk, first, consider the operation that would create a 1 GiB file containing only zeros (bs specifies block size, count the number of blocks):

 dd if=/dev/zero of=file1G.tmp bs=1M count=1024

The count option is the number of input blocks to be copied by dd. Multiplying 1 MiB by 1024 gives 1 GiB.
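The same relation between bs and count can be tried at a smaller scale, together with a check that the result contains only zeros. This is an illustrative sketch; the file name zeros.tmp and the variables size and nonzero are hypothetical.

```shell
workdir=$(mktemp -d)
# 4 blocks of 1024 bytes each: 4 x 1024 = 4096 bytes of zeros.
dd if=/dev/zero of="$workdir/zeros.tmp" bs=1024 count=4 2>/dev/null

size=$(wc -c < "$workdir/zeros.tmp")
# Delete every null byte and count what is left; 0 means the file held only zeros.
nonzero=$(tr -d '\000' < "$workdir/zeros.tmp" | wc -c)
echo "size=$((size)) nonzero=$((nonzero))"    # size=4096 nonzero=0
```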

Now here are ways to use dd to wipe a disk:

 dd if=/dev/urandom of=/dev/hda # wipe an entire disk with random data
 dd if=/dev/zero of=/dev/sda # zero out a drive

The output of dd may be piped to other Unix utilities to make it easier to examine.

Data recovery

The history of open-source software (OSS) for data recovery and restoration of files, drives, and partitions started with GNU dd in 1984, with one block size per dd process and no recovery algorithm other than the user's interactive session running one form of dd after another. In October 1999 a C program called dd_rescue was written; its algorithm uses two block sizes. The author of dd_rhelp, a 2003 shell script that enhances dd_rescue's data recovery algorithm, now recommends GNU ddrescue,[6] a C++ program that was published in 2004 and is now in most Linux distributions. GNU ddrescue has the most sophisticated block-size-changing algorithm available in OSS.[7] (The names ddrescue and dd_rescue are similar, yet they are different programs. The Debian Linux distribution nevertheless packages dd_rescue as "ddrescue", and packages GNU ddrescue as "gdrescue" or as "gddrescue".)

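GNU ddrescue is stable and safe.[8] Here is an untested rescue sketch using a few of ddrescue's options. The mapfile (here rescue.map, a hypothetical name) lets the second run resume where the first one stopped, and -f is needed because the output is a device rather than a regular file:

 ddrescue -f -n /dev/old_disk /dev/new_disk rescue.map # quickly grab large error-free areas, then stop
 ddrescue -f -d -r1 /dev/old_disk /dev/new_disk rescue.map # retry the error areas once using direct disk access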

Another open source program called savehd7 uses a sophisticated algorithm, but it also requires the installation of its own programming-language interpreter.

Miscellaneous uses

To benchmark a drive's sequential write and read performance, writing a file in 1024-byte blocks and then reading it back in 64 KiB blocks:

 dd if=/dev/zero bs=1024 count=1000000 of=file_1GB
 dd if=file_1GB of=/dev/null bs=64k

To make a file of 100 random bytes:

 dd if=/dev/urandom of=myrandom bs=100 count=1

To convert a file to uppercase:

 dd if=filename of=filename1 conv=ucase
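The conversion can be tried without any files, since dd reads standard input and writes standard output when if and of are omitted. This is a small sketch; the variable name upper is hypothetical.

```shell
# No if= or of= operand: dd filters standard input to standard output.
# The transfer statistics on standard error are discarded.
upper=$(printf 'hello, dd\n' | dd conv=ucase 2>/dev/null)
echo "$upper"    # HELLO, DD
```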

Create a 1 GiB sparse file or resize an existing file to 1 GiB without overwriting:

 dd if=/dev/zero of=mytestfile.out bs=1 count=0 seek=1G
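A smaller variant shows the difference between the apparent size of a sparse file and the space it occupies. This sketch assumes GNU dd, where the M suffix means 1024 × 1024 and a copy with count=0 sets the output length to the seek offset; the file name sparse.out is hypothetical.

```shell
workdir=$(mktemp -d)
# Assumes GNU dd: count=0 copies nothing, and seek=1M with bs=1
# sets the length of the output file to 1 MiB.
dd if=/dev/zero of="$workdir/sparse.out" bs=1 count=0 seek=1M 2>/dev/null

wc -c < "$workdir/sparse.out"    # apparent size: 1048576
du -k "$workdir/sparse.out"      # allocated size in KiB
```

On filesystems that support sparse files, the allocated size reported by du is typically far smaller than the apparent size.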

Limitations

Seagate documentation warns, "Certain disc utilities, such as DD, which depend on low-level disc access may not support 48-bit LBAs until they are updated."[9] Using ATA hard drives over 128 GiB requires 48-bit LBA. However, in Linux, dd uses the kernel to read or write to raw device files.[10] Support for 48-bit LBA has been present since version 2.4.23 of the kernel.[11]

It is jokingly said that dd stands for "disk destroyer", "data destroyer", "death and destruction", or "delete data", since when used for low-level operations on hard disks, a small mistake, such as reversing the if and of (input and output) parameters, could result in the loss of some or all data on a disk.[2]

References

  1. Bell Laboratories. "dd man page". http://www.orangetide.com/Unix/V7/usr/man/man1/dd.1. Retrieved 2009-02-25.
  2. Sam Chessman. "How and when to use the dd command?". CodeCoffee. http://www.codecoffee.com/tipsforlinux/articles/036.html. Retrieved 2008-02-19.
  3. "Dd - LQWiki". LinuxQuestions.org. http://wiki.linuxquestions.org/wiki/Dd. Retrieved 2008-02-19.
  4. Eric S. Raymond. "dd". http://www.catb.org/jargon/html/D/dd.html. Retrieved 2008-02-19.
  5. "The Unix "dd" command" (old discussion). alt.folklore.computers. http://www.djmnet.org/lore/dd-origin.txt. Retrieved 2011-07-05.
  6. "dd_rhelp author's repository". 19 September 2011. http://www.kalysto.org/utilities/dd_rhelp/index.en.html. "Important note : For some times, dd_rhelp was the only tool (AFAIK) that did this type of job, but since a few years, it is not true anymore : Antonio Diaz did write a ideal replacement for my tool : GNU 'ddrescue'."
  7. "Damaged Hard Disk". www.cgsecurity.org. http://www.cgsecurity.org/wiki/Damaged_Hard_Disk. Retrieved 2008-05-20.
  8. "Interview with GNU ddrescue's Antonio Diaz Diaz". Blue-GNU. Archived from the original on 2008-04-15. http://web.archive.org/web/20080415135125/http://blue-gnu.biz/content/interview_gnu_ddrescue_039_s_antonio_diaz_diaz. Retrieved 2008-12-06.
  9. "Windows 137GB (128 GiB) Capacity Barrier". Seagate Technology. March 2003.
  10. This is verifiable with strace.
  11. "ChangeLog-2.4.23". www.kernel.org. http://www.kernel.org/pub/linux/kernel/v2.4/ChangeLog-2.4.23. Retrieved 2009-12-07.
